An Integrated Tool for Annotating Historical Corpora
Abstract
E-Dictor is a tool for encoding ancient texts, applying levels of edition to them, and assigning part-of-speech tags. In short, it works as a WYSIWYG interface for encoding text in XML format. It grew out of experience gained while building the Tycho Brahe Parsed Corpus of Historical Portuguese and out of consortium activities with other research groups. Preliminary results show a decrease of at least 50% in the overall time taken by the editing process.
Related resources
eBonsai: An Integrated Environment for Annotating Treebanks
Syntactically annotated corpora (treebanks) play an important role in recent statistical natural language processing. However, building a large treebank is labor intensive and time consuming work. To remedy this problem, there have been many attempts to develop software tools for annotating treebanks. This paper presents an integrated environment for annotating a treebank, called eBonsai. eBons...
GerManC - Towards a Methodology for Constructing and Annotating Historical Corpora
For the 'Digital Historical Corpora Architecture, Annotation, and Retrieval' Conference, 03-08 December 2006, Dagstuhl (D). GerManC - Towards a Methodology for Constructing and Annotating Historical Corpora. Astrid Ensslin, Martin Durrell, Paul Bennett, University of Manchester (UK). Our paper focuses on the one hand on the challenges posed by the structural variability, flexibility and ambiguity found in...
Arabic anaphora resolution: corpora annotation with coreferential links
Annotated resources are much needed for the evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task that cannot be carried out without an appropriate tool. In this paper, we present our work on Arabic corpora annotation with anaphoric links (i.e., the annotation of the identity relation between the anaphors and their antecedents). In particular,...
TESLA: A Tool for Annotating Geospatial Language Corpora
In this paper, we present The gEoSpatial Language Annotator (TESLA)—a tool which supports human annotation of geospatial language corpora. TESLA interfaces with a GIS database for annotating grounded geospatial entities and uses Google Earth for visualization of both entity search results and evolving object and speaker position from GPS tracks. We also discuss a current annotation effort using...
BECAM tool - a semi-automatic tool for bootstrapping emotion corpus annotation and management
Corpus annotation is an important aspect in speech applications where stochastic models need to be trained and evaluated. Multimodal corpora are also annotated. Moreover, corpus annotation is an essential phase in the construction of emotion recognizer engines. Large corpora, as they are essential to construct representative knowledge bases, have been a problem for corpus annotators. Time consu...